The use of shibboleth words for automatically classifying speakers by dialect

Authors

  • A. W. F. Huggins
  • Yogen Patel
Abstract

Real-world applications using speech recognition must perform well over a range of dialects. Differences in dialect between the speakers in the training database and the target users often lead to degraded recognition performance. For the BBN Hark Hidden Markov Model (HMM) based system, we have already developed a reasonably effective technique [1] for dealing with multiple US dialects. The solution involves building separate HMM sets for each dialect from representative training speech data. This requires that training speakers be accurately classified by dialect, which is difficult to do reliably even by hand. In this paper we describe a recognition-based pseudo-automatic scheme for partitioning a pool of US English training speakers into groups, such that the speakers within each group share the same pronunciation characteristics. Our scheme is speech-data driven, and involves using transcript-level word hypotheses generated by a recognizer to partition the pool of training speakers.
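The abstract does not say how the word hypotheses are turned into speaker groups, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes the recognizer reports, for each occurrence of a shibboleth word, which pronunciation variant it hypothesized, and it groups speakers whose most frequent variants agree on every shibboleth word. The function names, the example words, and the variant labels are all hypothetical.

```python
from collections import Counter, defaultdict


def preferred_variants(hypotheses):
    """Return, per word, the variant hypothesized most often for one speaker.

    `hypotheses` is a list of (word, variant) pairs; this input format is an
    assumption made for illustration, not something the abstract specifies.
    """
    counts = defaultdict(Counter)
    for word, variant in hypotheses:
        counts[word][variant] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def partition_speakers(speaker_hypotheses, shibboleths):
    """Group speakers that share the same preferred variant for every shibboleth.

    A speaker with no data for a word gets None in that position.
    """
    groups = defaultdict(list)
    for speaker, hyps in sorted(speaker_hypotheses.items()):
        prefs = preferred_variants(hyps)
        signature = tuple(prefs.get(word) for word in shibboleths)
        groups[signature].append(speaker)
    return dict(groups)


# Hypothetical data: two shibboleth words, each with two variant labels.
example = {
    "spk1": [("greasy", "s"), ("greasy", "s"), ("wash", "plain")],
    "spk2": [("greasy", "z"), ("wash", "r")],
    "spk3": [("greasy", "s"), ("wash", "plain"), ("greasy", "z"), ("greasy", "s")],
}
groups = partition_speakers(example, ["greasy", "wash"])
for signature, speakers in groups.items():
    print(signature, speakers)
```

Exact-match grouping is the simplest choice and fragments quickly as the number of shibboleth words grows; a real system would more likely cluster on a similarity measure over the variant counts.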


Similar resources

Assimilation of Final Low Back Vowel in Eghlidian Dialect

In this article, the low back vowel /A/ in word-final positions in Eghlidian dialect, one of Persian dialects, is studied. This vowel is represented phonetically as [A], [o] and [@] in different phonetic environments. Therefore many words were collected via interviewing ten native speakers so that these different alternant forms can be accounted for appropriately. Since one of the authors of th...


Fluency and use of segmental dialect features in the acquisition of a second language (French) by English speakers

This study investigates the use of two parameters, fluency and use of segmental dialect features (accent) to rate the overall ability of speakers of French as a second language. A group of ten native English Canadians read a short text of 139 words in French (their second language). Their degree of fluency was established by a combination of the following measures: speech rate (words/min, syll/...


The generalized grave accent in the Sorunda dialect: preliminary observations of three generations

A large Eastern Central Sweden dialect area, including the Sorunda dialect, is traditionally characterized by a generalized grave tonal word accent which, according to informal observations, is in decline. The present paper reports the results of a preliminary auditory analysis of word accent use in three generations of Sorunda speakers. Using recordings of spontaneous speech, it was found that ...


The Status of [h] and [ʔ] in the Sistani Dialect of Miyankangi

The purpose of this article is to determine the phonemic status of [h] and [ʔ] in the Sistani dialect of Miyankangi. Auditory tests applied to the relevant data show that [ʔ] occurs mainly in word-initial position, where it stands in free variation with Ø. The only place where [h] is heard is in Arabic and Persian loanwords, and only in the pronunciation of some speakers who are educated and/or...


Gender and Dialect Bias in YouTube's Automatic Captions

This project evaluates the accuracy of YouTube’s automatically-generated captions across two genders and five dialects of English. Speakers’ dialect and gender were controlled for by using videos uploaded as part of the “accent tag challenge”, where speakers explicitly identify their language background. The results show robust differences in accuracy across both gender and dialect, with lower a...




Publication date: 1996